Communicating Unknown Words in Machine Translation

نویسندگان

  • Matthias Eck
  • Stephan Vogel
  • Alexander H. Waibel
چکیده

A new approach to handle unknown words in machine translation is presented. The basic idea is to find definitions for the unknown words on the source language side and translate those definitions instead. Only monolingual resources are required, which generally offer a broader coverage than bilingual resources and are available for a large number of languages. In order to use this in a machine translation system definitions are extracted automatically from online dictionaries and encyclopedias. The translated definition is then inserted and clearly marked in the original hypothesis. This is shown to lead to significant improvements in (subjective) translation quality.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

متن کامل

Handling Unknown Words in Statistical Machine Translation from a New Perspective

Unknown words are one of the key factors which drastically impact the translation quality. Traditionally, nearly all the related research work focus on obtaining the translation of the unknown words in different ways. In this paper, we propose a new perspective to handle unknown words in statistical machine translation. Instead of trying great effort to find the translation of unknown words, th...

متن کامل

Dealing with unknown words in statistical machine translation

In Statistical Machine Translation, words that were not seen during training are unknown words, that is, words that the system will not know how to translate. In this paper we contribute to this research problem by profiting from orthographic cues given by words. Thus, we report a study of the impact of word distance metrics in cognates’ detection and, in addition, on the possibility of obtaini...

متن کامل

Analogical translation of unknown words in a statistical machine translation framework

In this paper we address the problem of translating unknown words in a statistical machine translation framework. In data-driven machine translation, words that are not seen in the data may not be translated and are either discarded or left as is in the output. They are refered to as unknown words. The unknown word problem increases when the available bilingual data is scarce. In order to addre...

متن کامل

Translating Chinese Unknown Words by Automatically Acquired Templates

In this paper, we present a translation template model to translate Chinese unknown words. The model exploits translation templates, which are extracted automatically from a word-aligned parallel corpus, to translate unknown words. The translation templates are designed in accordance with the structure of unknown words. When an unknown word is detected during translation, the model applies tran...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008